Estimating effect size across datasets
Abstract
Most NLP tools are applied to text that differs from the text they were evaluated on. Common evaluation practice prescribes significance testing across data points in the available test data, but typically we have only a single test sample. This short paper argues that to assess the robustness of NLP tools we need to evaluate them on diverse samples, and we consider how best to estimate the true effect size of our systems over their baselines across datasets. We apply meta-analysis and show experimentally, by comparing estimated error reduction with observed error reduction on held-out datasets, that this method is significantly more predictive of success than the usual practice of using macro- or micro-averages. Finally, we present a new parametric meta-analysis based on nonstandard assumptions that seems superior to standard parametric meta-analysis.
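The abstract contrasts meta-analysis with macro- and micro-averaging. The sketch below illustrates that contrast under stated assumptions; it is not the paper's method. The effect measure is assumed to be the absolute reduction in error rate, each dataset's sampling variance uses an independent-binomial approximation (on a shared test set the two systems' errors are actually paired and usually correlated), and the meta-analysis is the standard DerSimonian-Laird random-effects estimator rather than the paper's nonstandard parametric variant. The `DatasetResult` schema, the helper names, and the example numbers are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class DatasetResult:
    """Error counts of a baseline and a system on one test set (hypothetical schema)."""
    name: str
    n: int                # number of test items
    baseline_errors: int
    system_errors: int

    def effect(self) -> float:
        """Absolute reduction in error rate (positive means the system is better)."""
        return (self.baseline_errors - self.system_errors) / self.n

    def variance(self) -> float:
        """Approximate sampling variance of effect(), ASSUMING the two error
        rates are independent binomial proportions (ignores pairing)."""
        p_b = self.baseline_errors / self.n
        p_s = self.system_errors / self.n
        v = (p_b * (1 - p_b) + p_s * (1 - p_s)) / self.n
        if v <= 0:
            raise ValueError(f"{self.name}: degenerate variance (error rates are 0 or 1)")
        return v


def macro_average(results):
    """Unweighted mean of the per-dataset effects."""
    return sum(r.effect() for r in results) / len(results)


def micro_average(results):
    """Effect computed after pooling all test items into one test set."""
    total_n = sum(r.n for r in results)
    return sum(r.baseline_errors - r.system_errors for r in results) / total_n


def random_effects(results):
    """DerSimonian-Laird random-effects estimate.

    Returns (pooled_effect, standard_error, tau2), where tau2 is the estimated
    between-dataset variance of the true effect.
    """
    y = [r.effect() for r in results]
    w = [1.0 / r.variance() for r in results]  # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw  # fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    k = len(results)
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0
    # Re-weight with the between-dataset variance added to each dataset's variance.
    w_star = [1.0 / (r.variance() + tau2) for r in results]
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; not results from the paper.
    results = [
        DatasetResult("newswire", 2000, 200, 150),
        DatasetResult("web", 500, 100, 90),
        DatasetResult("social", 300, 90, 88),
    ]
    print(f"macro: {macro_average(results):.4f}")
    print(f"micro: {micro_average(results):.4f}")
    est, se, tau2 = random_effects(results)
    print(f"random effects: {est:.4f} (SE {se:.4f}, tau^2 {tau2:.6f})")
```

The design difference is in the weighting: the macro-average gives every dataset equal weight, the micro-average weights datasets by size, and the random-effects estimate weights each dataset by the inverse of its within-dataset variance plus the estimated between-dataset variance, so that the spread of effects across datasets enters the pooled estimate and its uncertainty.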
Similar resources
Influence of Outliers on Accuracy Estimation in Genomic Prediction in Plant Breeding
Outliers often pose problems in analyses of data in plant breeding, but their influence on the performance of methods for estimating predictive accuracy in genomic prediction studies has not yet been evaluated. Here, we evaluate the influence of outliers on the performance of methods for accuracy estimation in genomic prediction studies using simulation. We simulated 1000 datasets for each of 1...
Genre Analysis and Genre-mixing Across Various Realizations of Academic Book Introductions in Applied Linguistics
Motivated by the need to explore the introductory sections of textbooks, the present study attempted to scrutinize three realizations of academic introductions, namely, Preface, Introduction, and Foreword in terms of their functions and potential generic structures in light of Swales’s (1990) views of genre. Moreover, the study aimed to investigate genre-mixing as an interdiscursivity element a...
A New Architecture Based on Artificial Neural Network and PSO Algorithm for Estimating Software Development Effort
Software project management has always faced challenges that have often had a great impact on the outcome of projects in future. For this, managers of software projects always seek solutions against challenges. The implementation of unguaranteed approaches or mere personal experiences by managers does not necessarily suffice for solving the problems. Therefore, the management area of software p...
Prognostic effect size of cardiovascular biomarkers in datasets from observational studies versus randomised trials: meta-epidemiology study
OBJECTIVE To compare the reported effect sizes of cardiovascular biomarkers in datasets from observational studies with those in datasets from randomised controlled trials. DESIGN Review of meta-analyses. STUDY SELECTION Meta-analyses of emerging cardiovascular biomarkers (not part of the Framingham risk score) that included datasets from at least one observational study and at least one ra...